Gran Chaco
Annotating and Inferring Compositional Structures in Numeral Systems Across Languages
Rubehn, Arne, Rzymski, Christoph, Ciucci, Luca, van Dam, Kellen Parker, Kučerová, Alžběta, Bocklage, Katja, Snee, David, Stephen, Abishek, List, Johann-Mattis
Numeral systems across the world's languages vary in fascinating ways, both regarding their synchronic structure and the diachronic processes that determined how they evolved into their current shape. For a proper comparison of numeral systems across different languages, however, it is important to code them in a standardized form that allows for the comparison of basic properties. Here, we present a simple but effective coding scheme for numeral annotation, along with a workflow that helps to code numeral systems in a computer-assisted manner, providing sample data for numerals from 1 to 40 in 25 typologically diverse languages. We perform a thorough analysis of the sample, focusing on the systematic comparison between the underlying and the surface morphological structure. We further experiment with automated models for morpheme segmentation, where we find allomorphy to be the major cause of segmentation errors. Finally, we show that subword tokenization algorithms are not viable for discovering morphemes in low-resource scenarios.
Indigenous Languages Spoken in Argentina: A Survey of NLP and Speech Resources
Ticona, Belu, Carranza, Fernando, Cotik, Viviana
Argentina has a large yet little-known Indigenous linguistic diversity, encompassing at least 40 different languages. The majority of these languages are at risk of disappearing, resulting in a significant loss of world heritage and cultural knowledge. Currently, unified information on speakers and computational tools is lacking for these languages. In this work, we present a systematization of the Indigenous languages spoken in Argentina, classifying them into seven language families: Mapuche, Tupí-Guaraní, Guaycurú, Quechua, Mataco-Mataguaya, Aymara, and Chon. For each one, we present an estimation of the national Indigenous population size, based on the most recent Argentinian census. We discuss potential reasons why the census questionnaire design may underestimate the actual number of speakers. We also provide a concise survey of computational resources available for these languages, whether or not they were specifically developed for Argentinian varieties.
- South America > Paraguay (0.15)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.06)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.05)
Low-carbon milk to AI irrigation: tech startups powering Latin America's green revolution
Leo Prieto's passion for nature started during his childhood by the sea. "I was obsessed with what was under the surface. I'd anchor myself to a rock with my snorkel, and I was fascinated by all the little animals doing things that go unnoticed." His teenage years coincided with the arrival of the internet in Chile, where he became a web pioneer, launching and selling several startups. Inevitably, his interests in the environment, the internet and business merged, driven by the feeling that technological advances should not be wasted.
- North America > Central America (0.43)
- South America > Chile (0.26)
- North America > Honduras (0.08)
- Energy (1.00)
- Food & Agriculture > Agriculture (0.51)
Mapping of Land Use and Land Cover (LULC) using EuroSAT and Transfer Learning
Kunwar, Suman, Ferdush, Jannatul
As the global population continues to expand, the demand for natural resources increases. Unfortunately, human activities account for 23% of greenhouse gas emissions. On a positive note, remote sensing technologies have emerged as a valuable tool in managing our environment. These technologies allow us to monitor land use, plan urban areas, and drive advancements in areas such as agriculture, climate change mitigation, disaster recovery, and environmental monitoring. Recent advances in AI, computer vision, and earth observation data have enabled unprecedented accuracy in land use mapping. By using transfer learning and fine-tuning with RGB bands, we achieved an impressive 99.19% accuracy in land use analysis. Such findings can be used to inform conservation and urban planning policies.
- North America > United States (0.14)
- Asia > India (0.05)
- South America > Ecuador (0.04)
GlotLID: Language Identification for Low-Resource Languages
Kargaran, Amir Hossein, Imani, Ayyoob, Yvon, François, Schütze, Hinrich
Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable, and (iii) is efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguages vs. varieties, and in general noisy data. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures. GlotLID-M model, code, and list of data sources are available: https://github.com/cisnlp/GlotLID.
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- South America > Peru > Huánuco Department > Huánuco Province > Huánuco (0.04)
- North America > Mexico > Puebla (0.04)
- Media > Television (0.45)
- Health & Medicine > Therapeutic Area > Neurology (0.33)
Multilingual Controllable Transformer-Based Lexical Simplification
Sheang, Kim Cheng, Saggion, Horacio
Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Therefore, suggesting simpler alternatives for complex words without compromising meaning would help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pre-trained masked language models to learn simpler alternatives for complex words. The evaluation results on three well-known LS datasets (LexMTurk, BenchLS, and NNSEval) show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Moreover, further evaluation of our approach on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively when compared with the participating systems for English LS and even outperforms the GPT-3 model on several metrics. In addition, our model obtains performance gains for Spanish and Portuguese.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Asia > China > Hong Kong (0.04)
Labeled Bipolar Argumentation Frameworks
Escañuela Gonzalez, Melisa G. (Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) - Universidad Nacional de Santiago del Estero (UNSE)) | Budán, Maximiliano C. D. | Simari, Gerardo I. (Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) - Universidad Nacional del Sur (UNS)) | Simari, Guillermo R. (Universidad Nacional del Sur (UNS))
An essential part of argumentation-based reasoning is to identify arguments in favor of and against a statement or query, select the acceptable ones, and then determine whether or not the original statement should be accepted. We present here an abstract framework that considers two independent forms of argument interaction (support and conflict) and is able to represent distinctive information associated with these arguments. This information can enable additional actions such as: (i) a more in-depth analysis of the relations between the arguments; (ii) a representation of the user's posture to help in focusing the argumentative process, optimizing the values of attributes associated with certain arguments; and (iii) an enhancement of the semantics taking advantage of the availability of richer information about argument acceptability. Thus, the classical semantic definitions are enhanced by analyzing a set of postulates they satisfy. Finally, a polynomial-time algorithm to perform the labeling process is introduced, in which the argument interactions are considered.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Argentina > Gran Chaco > Santiago del Estero Province > Santiago del Estero (0.04)
- North America > United States > New York (0.04)
Bipolar in Temporal Argumentation Framework
Budán, Maximiliano C. D., Cobo, María Laura, Martínez, Diego C., Simari, Guillermo R.
A Timed Argumentation Framework (TAF) is a formalism where arguments are only valid for consideration during given periods of time, called availability intervals, which are defined for every individual argument. The original proposal is based on a single, abstract notion of attack between arguments that remains static and permanent in time. Thus, in general, when identifying the set of acceptable arguments, the outcome associated with a TAF will vary over time. In this work we introduce an extension of TAF adding the capability of modeling a support relation between arguments. In this sense, the resulting framework provides a suitable model for different time-dependent issues. Thus, the main contribution here is to provide an enhanced framework for modeling positive (support) and negative (attack) interactions varying over time, which are relevant in many real-world situations. This leads to a Timed Bipolar Argumentation Framework (T-BAF), where classical argument extensions can be defined. The proposal aims at advancing the integration of temporal argumentation in different application domains.
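To make the idea of availability intervals concrete, here is a minimal sketch in Python. It is not the authors' formalism or algorithm: it assumes a simplified point-in-time reading in which, at a time t, only the arguments available at t (and the attacks among them) are considered, and it uses the standard grounded semantics of abstract argumentation to pick the acceptable ones. The support relation of T-BAF is not modeled, and the names `TimedArgument` and `grounded_extension_at` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedArgument:
    """Hypothetical container: an argument plus its availability intervals."""
    name: str
    intervals: tuple  # (start, end) pairs, both ends inclusive (an assumption)

    def available_at(self, t):
        return any(start <= t <= end for start, end in self.intervals)


def grounded_extension_at(arguments, attacks, t):
    """Grounded extension of the framework restricted to arguments available at t.

    `attacks` is a set of (attacker, target) name pairs; the attack relation
    itself is static, only the availability of arguments changes with t.
    """
    active = {a.name for a in arguments if a.available_at(t)}
    live_attacks = {(x, y) for x, y in attacks if x in active and y in active}
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for arg in sorted(active - accepted - rejected):
            attackers = {x for x, y in live_attacks if y == arg}
            if attackers <= rejected:
                # Every attacker is already defeated (or there is none).
                accepted.add(arg)
                changed = True
            elif attackers & accepted:
                # Some accepted argument attacks this one.
                rejected.add(arg)
                changed = True
    return accepted


# Illustrative data: B attacks A, but B is only available during [3, 6].
ARGS = [
    TimedArgument("A", ((0, 10),)),
    TimedArgument("B", ((3, 6),)),
]
ATTACKS = {("B", "A")}
```

With this toy data, `grounded_extension_at(ARGS, ATTACKS, 1)` yields `{"A"}`, while at t = 4 the attacker B is available and the result is `{"B"}`, which illustrates how the set of acceptable arguments varies over time even though the attack relation is fixed.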
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Argentina > Gran Chaco > Santiago del Estero Province > Santiago del Estero (0.04)
- Research Report (0.50)
- Summary/Review (0.46)
Dealing with Qualitative and Quantitative Features in Legal Domains
Budán, Maximiliano C. D., Cobo, María Laura, Martínez, Diego I., Rotolo, Antonino
In this work, we enrich a formalism for argumentation by including a formal characterization of features related to the knowledge, in order to capture proper reasoning in legal domains. We add meta-data information to the arguments in the form of labels representing quantitative and qualitative data about them. These labels are propagated through an argumentative graph according to the relations of support, conflict, and aggregation between arguments.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Argentina > Gran Chaco > Santiago del Estero Province > Santiago del Estero (0.04)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
An Approach to Characterize Graded Entailment of Arguments through a Label-based Framework
Budán, Maximiliano C. D., Simari, Gerardo I., Viglizzo, Ignacio, Simari, Guillermo R.
Argumentation theory is a powerful paradigm that formalizes a type of commonsense reasoning that aims to simulate the human ability to resolve a specific problem in an intelligent manner. A classical argumentation process takes into account only the properties related to the intrinsic logical soundness of an argument in order to determine its acceptability status. However, these properties are not always the only ones that matter to establish the argument's acceptability; there exist other qualities, such as strength, weight, social votes, trust degree, relevance level, and certainty degree, among others.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Argentina > Gran Chaco > Santiago del Estero Province > Santiago del Estero (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Law (1.00)
- Government (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)